ABSTRACT

We present a statistical method for the photometric search of rare astronomical
sources based on the weighted k-NN method. A metric is defined in a multi-dimensional
color-magnitude space based only on the photometric properties of template sources and
the photometric uncertainties of both templates and data, without the need to define
ad-hoc color and magnitude cuts which could bias the search. The metric is defined as
a function of two parameters, the number of neighbors k and a threshold distance D_th,
that can be optimized for maximum selection efficiency and completeness. We apply the
method to the search for L and T dwarfs in the Spitzer Extragalactic First Look Survey
and the Bootes field of the Spitzer Shallow Survey, as well as to the search for sub-stellar
mass companions around nearby stars. With a high level of completeness, we confirm the
absence of late-T dwarfs detected in at least two bands in the First Look Survey, and
the presence of only one in the Shallow Survey (previously discovered by Stern et al. 2007).
This result is in agreement with the expected statistics for late-T dwarfs. One L/early-T
candidate is found in the First Look Survey, and three in the Shallow Survey, currently
undergoing follow-up spectroscopic verification. Finally, we discuss the potential for brown
dwarf searches with this method in the Spitzer warm mission Exploration Science programs.
1. Introduction



One of the most common problems in astrophysics is the classification of astronomical sources
based on their colors and magnitudes in a given photometric system. When high spectral resolution
data (or a large number of photometric bands covering the source SED) are available, this
classification, and the extraction of the physical parameters of the source, is generally achieved
by fitting the observations with an appropriate physical model. In many instances, however, either
a reliable model is not available, or the number of photometric bands is not sufficient to provide
a robust source classification. When data fitting is not possible, a common fallback solution is to
infer the nature of the sources to be classified by their proximity to "regions" in meaningful
color-color and color-magnitude diagrams, where the sources of a certain class are expected to be
found.

These regions are in turn defined on the basis of generic physical considerations (e.g. stars
burning H in their cores are located in the region of the Hertzsprung-Russell diagram we call the
Main Sequence) or by association with other sources of the same class. A typical example of
this approach in the early years of infrared space astronomy was the set of IRAS color-color
diagrams (van der Veen & Habing 1988), aimed at automatically classifying the 2.4 × 10^5 sources
found by the InfraRed Astronomical Satellite in its bands at 12, 25, 60 and 100 μm. The diagrams
were created by deriving the IRAS colors of ~5,400 sources whose nature could be inferred from the
properties of their IRAS Low Resolution Spectra (LRS 1986). The resulting diagrams were a grid of
polygonal regions where sources with specific properties (stars, circumstellar envelopes with
varying degrees of optical thickness, planetary nebulae and other infrared sources) were expected
to be found. As is common in these cases, the boundaries between the regions were defined
arbitrarily by using a convenient geometrical pattern bisecting the known "template" sources used
for building the diagrams. Most importantly, these regions did not have an associated statistical
meaning: it was not possible to quantify how complete and effective the source classification
provided by these regions was.

Other branches of science, however, have developed statistically valid techniques to attack this
kind of unstructured classification problem, where detailed knowledge of a model is not required,
or not available. The k-Nearest Neighbors (k-NN) method (Fix & Hodges 1951), in particular,
has been successfully used as an efficient "black box" predictor for problems of pattern recognition
and unsupervised machine learning, in fields ranging from computerized handwriting recognition
(Simard et al. 1993) to automatic classification of satellite imagery (Michie, Spiegelhalter &
Taylor 1994), to medical imaging and diagnostics.



In astronomy, k-Nearest Neighbors methods have traditionally been used to study clustering
in the spatial distribution of astronomical sources (see e.g. Bahcall & Soneira 1981), by analyzing
the statistical distribution of the distances, on the plane of the sky or in three-dimensional
space, between each source and its nearest neighbors. Alternatively, the method has been the basis
of regression techniques for parameter fitting (e.g. photometric redshifts, see Ball et al. 2007).
In this paper, we will instead apply the k-NN method in its role of nonparametric classifier, where
the class of a new set of data is decided based on its distance from a class of "templates", and
where the distance is defined in a multi-dimensional color and magnitude space. Our implementation
of the method is specifically tuned to the search for rare sources hidden in a large catalog.

To illustrate the effectiveness of the method, we apply our technique to the search for brown
dwarfs with the InfraRed Array Camera (IRAC, Fazio et al. 2004) onboard the Spitzer Space
Telescope (Werner et al. 2004). As shown by Patten et al. (2006), brown dwarfs have unique
colors in the near-IR and IRAC 3.6, 4.5, 5.8 and 8.0 μm bands, due to the presence of prominent
molecular features such as CH4, H2O, NH3 and CO (Oppenheimer et al. 1998; Cushing et al. 2005;
Roellig et al. 2004) in the wavelength range covered by the camera (see Figure 1). These colors
provide a powerful discriminant to identify brown dwarfs within the large photometric catalogs
that have been produced during the Spitzer cryogenic mission. The k-NN method is particularly
suited for this search, because of its high efficiency in finding "needles in the haystack" such as
brown dwarfs, among the general galactic population and the extragalactic background.

The method is first applied using data from the Spitzer Extragalactic First Look Survey (XFLS,
Lacy et al. 2005) and the Spitzer Shallow Survey (Eisenhardt et al. 2004), which are combined
with ground based optical and near-IR surveys for further refinement of the candidate sample. The
parameter space of the possible color combinations and k-NN parameters is explored in order to
provide and quantify the best possible search completeness and efficiency. Searches using only the
two IRAC bands at 3.6 and 4.5 μm are also investigated, to assess the possibility of brown dwarf
detection using only the two channels that will be available during the post-cryogenic Spitzer warm
mission.



Section 2 of the paper describes our implementation of the k-NN method, which is then applied
in Section 3 to search for field brown dwarfs in the XFLS and Shallow surveys. In Section 4 the
k-NN method is used to estimate the efficiency and completeness of IRAC photometric searches
for brown dwarf companions around nearby stars. In Section 5 we summarize the results of these
searches, and discuss other possible applications of the method.



2. The k-NN Method



In a typical application of the k-NN method, as described by Hechenbichler & Schliep (2004),
the class of a test element is selected by a majority vote of its k nearest templates (where k
is optimized for the specific problem at hand). Template objects for each class (i.e. the training
sample) are distributed in the multi-dimensional space defined by the variables used in the analysis.
The class of the test element is determined by the prevalent class of the k templates that, according
to a given metric, are closest to the test element. Note that in this case the choice of the metric is
only important for the selection of the k nearest templates, after which the decision is determined
by the rule adopted to weight the "votes" of these k nearest templates.



The situation is however different in the classical astronomical setting, where sources of one
specific class are searched among a larger set belonging to many other classes, which are generally
not specified. In this case, a majority vote is not possible and a different approach is required. In
order to address this challenge, we have developed a novel implementation of the k-NN method
tuned for the search of rare sources in astronomical photometric catalogs. Our application requires
templates only for the "search class", relying on the assumption that the templates are an accurate
representation of the class, and that the selection variables (colors and magnitudes) chosen in
the analysis are sufficient to provide an effective discrimination. The only quantitative criterion
available for the selection, and used to determine its statistical validity, will be the final k-NN
distance of each source in the input catalog, according to the chosen method. As an example, in
Section 3 we show the application of our method to the search for brown dwarfs (the signal) in a
catalog where most of the other sources are galaxies or regular stars in different evolutionary stages
(the background).



2.1. The k-NN Metric

In any k-NN application the choice of the metric is arbitrary, and is ultimately determined
only on the basis of its ability to effectively separate the signal from the background. If the analysis
involves separate variables (in this case colors or absolute magnitudes), a common choice is the
euclidean distance in the N_dim-dimensional space, with the individual distances in each variable
summed, or averaged, over the number of dimensions:

$$
D(i,j) =
\begin{cases}
\sqrt{\sum_{l=1}^{N_{dim}} d_l(i,j)^2} & \text{euclidean distance} \\
\sqrt{\frac{1}{N_{dim}} \sum_{l=1}^{N_{dim}} d_l(i,j)^2} & \text{averaged euclidean distance}
\end{cases}
\qquad (1)
$$

where d_l(i,j) is the distance between the source i and the template j with respect to the variable l.
The effect of averaging over the dimensionality N_dim is illustrated in Figure 2: the average increases
the size of the ellipse enclosing test sources within a given distance from each template. If a source
has a unitary distance from a template along each variable, the distance will be larger than one in
the case of a pure euclidean metric, but still equal to one in the case of the averaged euclidean metric.
The latter is preferable for multi-dimensional spaces, where it is not desirable to have a metric that
tends to become larger as the dimensionality increases: in other words, it is a good choice
to have a distance close to unity if the individual components of such distance in each variable
are all around unity. For this reason we have adopted the averaged euclidean distance in this
implementation of the k-NN method. Furthermore, averaging allows the individual components of
the metric to be weighted differently, in case some of the variables (colors or magnitudes) have a
stronger discriminatory power for the problem at hand. However, in the applications presented here
we assume for simplicity that all the variables are equally important, and no weighting is necessary.
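The difference between the two metrics can be checked numerically. The following Python snippet is a minimal sketch: the function names and the optional per-variable weights are illustrative assumptions, not part of the published method. It evaluates both distances for a source at unit distance from a template along every variable.

```python
import math

def euclidean(d):
    """Pure Euclidean distance from the per-variable distances d_l."""
    return math.sqrt(sum(x * x for x in d))

def averaged_euclidean(d, weights=None):
    """Averaged Euclidean distance: (weighted) mean of d_l^2, then square root.

    `weights` is a hypothetical per-variable weighting; equal weights
    reproduce the plain average over the number of dimensions.
    """
    if weights is None:
        weights = [1.0] * len(d)
    total = sum(weights)
    return math.sqrt(sum(w * x * x for w, x in zip(weights, d)) / total)

# A source at unit distance along every variable, for growing dimensionality.
for n_dim in (1, 2, 4, 6):
    unit = [1.0] * n_dim
    print(n_dim, round(euclidean(unit), 3), round(averaged_euclidean(unit), 3))
```

The pure Euclidean distance grows as the square root of the number of dimensions, while the averaged one stays at unity, which is the behavior argued for in the text.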



To ensure that the distances along each variable play an equal role in the final metric, a
normalization is required. The metric should take into account, for example, whether one color or
magnitude has a larger uncertainty than the others. Thus we divide each distance by its associated
total uncertainty:

$$
d_l(i,j) = \frac{|x_l(i) - x_l(j)|}{\sigma_l(i,j)} \; ; \qquad
\sigma_l(i,j) = \sqrt{\sigma_i^2 + \sigma_j^2 + \sigma_S(j)^2} \qquad (2)
$$

where x_l(i) and x_l(j) are the values of the variable l for the source and the template, σ_i and σ_j
are the statistical uncertainties of the data and templates respectively (e.g. their 3σ photometric
error) in the variable l, and where σ_S is a measure of the non-Gaussian systematic errors of the
template j (explained below), also in the variable l.
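As an illustration of this normalization, the sketch below computes the distance between one source and one template. It assumes that the total uncertainty is the quadrature sum of the data, template and systematic terms, and that the per-variable distances are combined with the averaged euclidean metric. The function and argument names are hypothetical.

```python
import math

def total_uncertainty(sigma_i, sigma_j, sigma_s=0.0):
    """Quadrature sum of data, template and systematic terms (one variable)."""
    return math.sqrt(sigma_i ** 2 + sigma_j ** 2 + sigma_s ** 2)

def source_template_distance(x_i, sig_i, x_j, sig_j, sig_s=None):
    """Averaged Euclidean distance D(i,j) built from the normalized d_l(i,j).

    x_i, sig_i: values and uncertainties of the source, one entry per variable.
    x_j, sig_j: values and uncertainties of the template.
    sig_s:      optional systematic (sparseness) term of the template.
    """
    if sig_s is None:
        sig_s = [0.0] * len(x_i)
    d = [abs(a - b) / total_uncertainty(si, sj, ss)
         for a, b, si, sj, ss in zip(x_i, x_j, sig_i, sig_j, sig_s)]
    return math.sqrt(sum(v * v for v in d) / len(d))

# Two colors, each offset by exactly one total uncertainty (0.5 mag).
D = source_template_distance([1.0, 0.5], [0.3, 0.4], [0.5, 0.0], [0.4, 0.3])
print(round(D, 6))
```

A source offset from a template by one total uncertainty in every variable sits at a distance of one, whatever the individual error bars are.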

The final k-NN distance of a source i is then the (weighted) average of the distances to the
nearest k templates (the optimal number of neighbors k is determined with the techniques described
in Section 2.2):

$$
D_{kNN}(i) = \frac{\sum_{j=1}^{k} D(i,j) \cdot w(i,j)}{\sum_{j=1}^{k} w(i,j)} \qquad (3)
$$

The weights w(i,j) are introduced to reduce the influence of isolated templates that happen
to be much farther away than the other nearest neighbors. A Gaussian kernel is very effective for
this task:

$$
w(i,j) = \exp\left[ -\frac{D(i,j)}{k \cdot \langle \sigma(i,j) \rangle} \right] \qquad (4)
$$



Note that the Gaussian kernel is parametrized with the geometric average (on all variables l) of
the same normalization factor σ(i,j) adopted for the individual distances. This is again necessary
to scale the range of the kernel proportionally to the accuracy of the individual templates. The
extra factor k plays the role of reducing the effectiveness of the kernel as the number of neighbors
increases, which is the intended goal of using a large value for k.
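The weighted average over the nearest neighbors can be sketched as follows. The kernel form used here, exp[-D / (k σ)] with σ the geometric average of the per-variable normalization factors, is an assumption made for the purpose of illustration, as are the function and argument names.

```python
import math

def knn_distance(dists, sigma_bar, k):
    """Weighted k-NN distance of one source.

    dists:     D(i,j) from the source to every template.
    sigma_bar: geometric average of sigma_l(i,j) over the variables,
               one value per template (same order as `dists`).
    k:         number of nearest templates to average over.
    """
    nearest = sorted(zip(dists, sigma_bar))[:k]
    # Assumed kernel: distant neighbors are down-weighted, less so for large k.
    weights = [math.exp(-d / (k * s)) for d, s in nearest]
    weighted_sum = sum(d * w for (d, _), w in zip(nearest, weights))
    return weighted_sum / sum(weights)

# Two close templates and one isolated template far away.
d_weighted = knn_distance([0.5, 0.6, 5.0], [1.0, 1.0, 1.0], 3)
d_plain = (0.5 + 0.6 + 5.0) / 3
print(round(d_weighted, 3), round(d_plain, 3))
```

In this example the isolated template at distance 5 contributes far less to the weighted result than it does to the plain average, which is the purpose of the kernel.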

The effect of this normalization on the k-NN distance is shown in Figure 3. The contours
traced around the template sources enclose the areas within a given k-NN distance from the training
sample. The meaning of this region becomes obvious in the case of k = 1 (solid line). Thanks to
the normalization with the total uncertainty σ(i,j), the region with a radius D_kNN = 1 is nothing
else than the union of the error ellipses around the templates. A test source within the region
will have a separation from the template class which is less than the uncertainty in the data and
the templates, and will thus likely be a member of the template class. A source with D_kNN > 1
will instead have a greater probability of not being a member, and should be rejected. Note that
for k = 1 the border of the region closely follows the location of the templates, deviating from a
smooth line because of the scatter in the distribution of the training sample. A larger value of k, on
the other hand, will average over the positions of the individual templates, thus producing a smoother
contour, albeit at the risk of excluding isolated templates from the region. The best value of k will
be a compromise between these two different tendencies, and needs to be determined case by case,
with the parameter optimization techniques explained in the following section.

The solid line in Figure 3 was derived without adding the non-Gaussian systematic error σ_S in
the templates. As a result, for any given k-NN distance, the size of the region is determined only by
the statistical errors in data and templates. A direct consequence of this, however, is that templates
that are separated by a distance larger than their statistical error will produce separated regions.
This is not desirable, as in most astronomical applications the location of the templates is only
partially under the control of the astronomer. Due to low statistics, it depends on the chance
location, in the color-color and color-magnitude space, of the template sources that it has been
possible to observe. This is particularly true for the search of rare sources, where only a small
number of templates is generally available. The sparseness of the templates in this case may lead to
a serious problem, as regions of the color-magnitude space where sources of the template class may
exist could be excluded only because no templates had been observed there at the time the training
sample was assembled. To correct this issue we introduce in our k-NN metric the sparseness factor
σ_S, which is a measure of how far apart the templates are with respect to each variable l:

$$
\sigma_S(j) = \frac{1}{k} \sum_{m=1}^{k} |x_l(j) - x_l(j_m)| \qquad (5)
$$

where j_1, ..., j_k are the k templates nearest to j, so that σ_S(j) is defined as the average distance
of the template j from the remaining k nearest templates. The dotted line (for k = 1) in Figure 3
shows how the introduction of the sparseness factor σ_S, acknowledging the inadequacy of the template
distribution in a region of the parameter space, is able to reconnect the region despite the lack of
templates between the two sets that are separated in the solid line region. Particular caution,
however, should be taken when introducing the σ_S(j) defined as in equation 5 in cases where a gap
in the template distribution is actually expected in the data (e.g. the gap in the Horizontal Branch
for He burning giants in the Hertzsprung-Russell diagram). In such cases, the presence of the gap can
only be noticed, and is statistically significant, when it is sampled by a large number of templates
on both sides of the gap. If k is chosen to be much smaller than the number of templates in the gap
region, the gap will still be preserved, as no template across the gap will be among the k nearest
neighbors in equation 5, and the sparseness factor will be smaller than the width of the gap.
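A possible way to compute the sparseness factor is sketched below. It assumes that the k nearest templates are selected with a plain euclidean distance in the space of the variables, and that the factor is the mean absolute separation along each variable. Both choices, and the function name, are assumptions made for illustration.

```python
import math

def sparseness(templates, j, k):
    """Per-variable sparseness factor of template j.

    templates: list of points, each a sequence with one value per variable.
    Returns a list with the mean |x_l(j) - x_l(m)| over the k templates
    nearest to j (nearest in plain Euclidean distance, an assumption).
    """
    x_j = templates[j]
    others = [t for m, t in enumerate(templates) if m != j]
    others.sort(key=lambda t: math.dist(x_j, t))
    nearest = others[:k]
    return [sum(abs(x_j[l] - t[l]) for t in nearest) / len(nearest)
            for l in range(len(x_j))]

# Three clustered templates and one far away (e.g. across a genuine gap).
templates = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
print(sparseness(templates, 0, 2))
```

With k = 2 the distant template never enters the average for the clustered ones, so their sparseness factor stays much smaller than the width of the gap, as discussed in the text.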



2.2. Application of the Method 



The k-NN metric we have defined in Eq. 3 is ideally suited for selecting rare astronomical
sources based on their spectral properties from a large photometric catalog.

The first step in applying the method is the identification of the variables to use. These
variables can be any combination of colors or magnitudes. Unless the number of available bands
is so small that all the possible combinations can be tested, the best course of action is to choose
the variables that, based on astrophysical considerations, provide the best discrimination (e.g.
colors sensitive to peculiar spectral features). Caution must be used to avoid choosing too many
variables: even though one would naively assume that more variables would produce a better
selection, this is not always the case and can lead to the so-called "curse of dimensionality" (see
e.g. Hastie et al. 2003). Variables that do not carry a significant discriminating role should also be
avoided, because they would dilute the effectiveness of the metric by averaging out more effective
variables. Having multiple variables sensitive to the same physical property can also be
counterproductive, as it biases the metric towards this one physical characteristic, at the expense of
other equally or more important discriminants. A solution to this issue is either to avoid adding
variables not contributing any original discriminant, or to reduce their influence by fine-tuning the
k-NN metric with appropriate weights.

Once the variables are chosen, the D_kNN distance of each source in the catalog can be
determined. The selection is done by comparing D_kNN with a threshold value D_th above which the
sources are rejected. For maximum effectiveness the number of neighbors k and the threshold
distance D_th have to be optimized for the problem at hand. The goal of this optimization is to select
the smallest possible number of candidates to follow up, while preserving the completeness of the
search. In this context, we define the completeness C as the fraction of the objects that are found,
with respect to their expected number. In addition, we define the rejection efficiency E as the
fraction of the background objects that are successfully rejected. An ideal search will have 100%
completeness (all sources are found) and 100% rejection efficiency (all the returned candidates are
genuine). In practice both fractions will be lower than 1, and the search parameters should be
optimized to provide the maximum possible rejection efficiency and completeness. This optimization
can be done using either the jackknife or the bootstrap methods (Hastie et al. 2003). Both methods
attempt to estimate the statistical distribution of C(k, D_th) and E(k, D_th) to determine the best
values of k and D_th.
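The threshold selection and the two figures of merit defined above can be written in a few lines. In this sketch the function names are hypothetical and the input lists hold precomputed k-NN distances; sources at or below the threshold are kept as candidates.

```python
def select_candidates(d_knn, d_th):
    """Indices of the catalog sources with k-NN distance within the threshold."""
    return [i for i, d in enumerate(d_knn) if d <= d_th]

def completeness(signal_d_knn, d_th):
    """Fraction of genuine (signal) sources that pass the selection."""
    return sum(d <= d_th for d in signal_d_knn) / len(signal_d_knn)

def rejection_efficiency(background_d_knn, d_th):
    """Fraction of background sources that are rejected by the selection."""
    return sum(d > d_th for d in background_d_knn) / len(background_d_knn)

# Illustrative k-NN distances for a simulated signal and background sample.
signal = [0.3, 0.8, 1.4, 0.6]
background = [0.9, 2.5, 3.1, 4.0, 1.8]
print(completeness(signal, 1.0), rejection_efficiency(background, 1.0))
```

Raising the threshold can only increase the completeness and decrease the rejection efficiency, which is why the two have to be balanced against each other.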

The jackknife method tests the minimum distance for which the templates are a homogeneous
and contiguous set. Given a certain k, one measures the k-NN distance of each template from the
remaining n - 1 (leave-one-out method). Once this is done for all templates, the completeness
is derived as the fraction of templates that are within any given threshold distance D_th. While
this method is relatively straightforward, it may be very inaccurate for the search of rare sources,
since the number of available templates is often small and does not cover uniformly the color and
magnitude distribution of the target objects. Thus we adopt the bootstrap method.
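The leave-one-out procedure can be sketched as follows. For brevity this example uses a simplified distance (statistical errors only, unweighted mean over the k nearest templates) rather than the full weighted metric; the function names and the data layout are assumptions.

```python
import math

def pair_distance(x, sx, t, st):
    """Averaged Euclidean distance, normalized by the statistical errors only."""
    terms = [(a - b) ** 2 / (sa ** 2 + sb ** 2)
             for a, b, sa, sb in zip(x, t, sx, st)]
    return math.sqrt(sum(terms) / len(terms))

def knn_distance(x, sx, templates, k):
    """Unweighted mean distance to the k nearest templates (a simplification)."""
    d = sorted(pair_distance(x, sx, t, st) for t, st in templates)[:k]
    return sum(d) / len(d)

def jackknife_completeness(templates, k, d_th):
    """Fraction of templates within d_th of the remaining n - 1 templates."""
    inside = 0
    for j, (t, st) in enumerate(templates):
        rest = templates[:j] + templates[j + 1:]  # leave template j out
        if knn_distance(t, st, rest, k) <= d_th:
            inside += 1
    return inside / len(templates)

# Each template is (values, uncertainties); one variable for simplicity.
templates = [((0.0,), (0.1,)), ((0.1,), (0.1,)), ((0.2,), (0.1,)), ((5.0,), (0.1,))]
print(jackknife_completeness(templates, 1, 1.0))
print(jackknife_completeness(templates, 2, 1.0))
```

With only four templates the estimate changes sharply with k, which illustrates why the jackknife can be unreliable when the training sample is small.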

With the bootstrap method we evaluate the completeness by creating an artificial sample
with the characteristics of the templates. This artificial sample is then tested with the k-NN
method to estimate how many of these simulated sources are found within a distance D_th. The
test sample is generated by varying the template colors and magnitudes, adding a random offset
using the statistical and systematic errors in the templates. The statistical errors are introduced
by adding a Gaussian error equal to the statistical uncertainty of the templates (σ_j in equation 2).
The systematic errors are instead simulated by adding a random uniform component equal to the
amplitude of the template sparseness factor σ_S defined in equation 5. With this method it is possible
to create an arbitrary number of simulated signal and background samples of any size, enabling
the study of the statistical distribution of C(k, D_th) even when only a small number of templates is
available.
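The generation of the artificial sample can be sketched as below. The sketch assumes that the uniform component is drawn symmetrically between minus and plus the sparseness factor, and that templates are resampled with equal probability; the function name and the data layout are hypothetical.

```python
import random

def bootstrap_sample(templates, n, rng):
    """Draw n simulated sources from the templates.

    templates: list of (values, sigma_stat, sigma_sparse), each a list with
               one entry per variable.
    rng:       a random.Random instance, so that the draw is reproducible.
    """
    sample = []
    for _ in range(n):
        x, s_stat, s_sparse = rng.choice(templates)
        sample.append([
            # Gaussian statistical scatter plus a uniform systematic offset
            # (symmetric range assumed).
            x_l + rng.gauss(0.0, s_l) + rng.uniform(-sp_l, sp_l)
            for x_l, s_l, sp_l in zip(x, s_stat, s_sparse)
        ])
    return sample

templates = [([1.0, 2.0], [0.05, 0.10], [0.5, 0.1]),
             ([1.5, 2.5], [0.05, 0.10], [0.3, 0.2])]
simulated = bootstrap_sample(templates, 1000, random.Random(42))
print(len(simulated), len(simulated[0]))
```

The simulated sources can then be passed through the same k-NN distance used for the catalog, and the fraction falling within D_th gives one realization of the completeness.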

The rejection efficiency E is similarly evaluated with the bootstrap method. If a "pure"
background sample can be extracted from the catalog, then it is just a matter of counting the fraction
of catalog sources that are rejected with any combination of k and D_th. In most cases a pure
background sample is however not available but, if the objects in the search class are rare, any small
sub-sample of the full catalog can be assumed to have a very small contamination of template-class
sources. These sub-samples can be randomized by introducing Gaussian noise equal to the data
uncertainty σ_i defined in equation 2. Sub-samples that by chance have a larger level of contamination
will appear as outliers, and can be removed from the final distribution of E(k, D_th).

Once the distribution of C and E is known, the characteristics of the selection problem
determine the value of k and D_th to choose. If the search needs to be very selective because a
large number of follow-up observations is not affordable, then the completeness can be sacrificed
(generally by using a small value of D_th) in favor of high E. On the contrary, when completeness
is paramount, a larger threshold distance can be adopted even though it will result in a large
number of candidates. As an intermediate solution we adopt the value of D_th and k for which
E(k, D_th) = C(k, D_th) is the highest.
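On a discrete grid of parameters the two curves rarely cross exactly, so one practical approximation of this criterion is to maximize the smaller of the two fractions. The sketch below takes this approach; the function name and the dictionaries of precomputed k-NN distances are assumptions made for illustration.

```python
def best_parameters(signal_by_k, background_by_k, thresholds):
    """Grid search for the (k, D_th) pair that maximizes min(C, E).

    signal_by_k, background_by_k: dicts mapping k to the list of k-NN
    distances of the simulated signal and background sources.
    Returns (score, k, d_th, completeness, rejection_efficiency).
    """
    best = None
    for k in sorted(signal_by_k):
        sig, bkg = signal_by_k[k], background_by_k[k]
        for d_th in thresholds:
            c = sum(d <= d_th for d in sig) / len(sig)
            e = sum(d > d_th for d in bkg) / len(bkg)
            # C grows and E shrinks with d_th, so min(C, E) peaks
            # near the crossing point of the two curves.
            score = min(c, e)
            if best is None or score > best[0]:
                best = (score, k, d_th, c, e)
    return best

signal = {1: [0.2, 0.5, 0.8, 1.5], 3: [0.4, 0.6, 0.9, 1.1]}
background = {1: [0.9, 2.0, 3.0, 4.0], 3: [1.5, 2.5, 3.0, 4.0]}
print(best_parameters(signal, background, [0.5, 1.0, 1.2, 2.0]))
```

A search that favors purity or completeness instead would replace the score with a different combination of the two fractions.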

After the k-NN metric is applied with the optimized parameters, all sources within the
threshold distance D_th should be considered as candidates. These candidate sources can be followed up
by applying further selection criteria (e.g. applying color cuts that could not be included in the
k-NN metric, or by executing new targeted observations). If the selected sample is still too big
for a follow-up program, it can be helpful to run the k-NN method a second time on the first-run
selection, using a different set of variables. This is particularly effective if there is a concern that
some variables have yielded a lower selection efficiency than expected, due to having been averaged
out in the metric by other variables. In this case it may be more efficient to re-apply the
k-NN method using only these variables, starting from the sources selected in the first run, rather
than trying to improve the efficiency of the first k-NN run by fine tuning the weights between the
variables.

3. Searching Brown Dwarfs in Spitzer Wide Field Surveys 

As an application of the method, we present the search for brown dwarfs in Spitzer surveys.
Brown dwarfs represent the link between main sequence stars, fully supported by H burning in their
cores, and planets. Given our poor understanding of the lower end of the stellar mass distribution,
an accurate census of the galactic population of brown dwarfs is of paramount importance to
constrain models of star formation and galactic evolution, and to provide an accurate measurement
of the stellar mass in the Galaxy.

Due to their low luminosity and red colors, brown dwarfs are difficult to find. The first
unambiguous identification of a brown dwarf, Gliese 229B (Oppenheimer et al. 1995), came only 20 years
after the class was introduced by Jill Tarter in her Ph.D. thesis. In recent years, however, the
availability of deep wide area surveys such as the Two Micron All Sky Survey (2MASS, Skrutskie et al.
2006), the Deep Near Infrared Survey of the Southern Sky (DENIS, Epchtein et al. 1999), the
Sloan Digital Sky Survey (SDSS, York et al. 2000) and the UKIRT (UK Infrared Telescope)
Infrared Deep Sky Survey (UKIDSS, Lawrence et al. 2007) has allowed the identification of an
increasing number of brown dwarfs in the solar neighborhood (see e.g. Kirkpatrick et al.; Leggett et al.;
Chiu et al.; Geballe et al.; Cruz et al.; Burgasser 2004; Knapp et al.; Looper et al.; Pinfield et al.;
Burgasser et al.; Tinney et al.). However, the total
number of brown dwarfs known to date (~ 556 of the red, dusty L dwarfs, and ~ 148 of the cooler,
methane rich T dwarfs, according to the DwarfArchives.org database) is still too small to provide
a reliable characterization of the substellar mass function.

The sensitivity of the IRAC instrument onboard Spitzer, and the characteristics of its
photometric system, raised expectations for a large increase in the number of brown dwarfs (especially
the cooler T dwarfs) detected using wide area Spitzer surveys. These expectations, however, have
not yet materialized. Only three T dwarfs have been discovered by Spitzer: a T4.5 field dwarf in
the Extragalactic Spitzer Shallow Survey (Stern et al. 2007), and two T dwarf companions around
the nearby young stars HN Peg and HD 3651 (Luhman et al. 2007). This state of affairs arises from
the complexity of discriminating brown dwarf candidates from the large number of extragalactic
red sources that are within the detection limits of IRAC observations.

The success of photometric searches ultimately depends on the efficiency of the selection
method required to extract from these large catalogs a manageable sample of sources for
spectroscopic follow-up. This selection is usually done by applying cuts in the color and magnitude
space (see e.g. Cruz et al. 2003 and Burgasser et al. 2003 and references therein). Stern et al.
(2007), in particular, used a single IRAC color cut, [3.6] - [4.5] > 0.4, to select T dwarfs with deep
CH4 absorption in the 3.6 μm IRAC band, complemented by criteria based on the photometry and
the morphology of the sources in optical bands. These criteria are not able to discriminate between
brown dwarfs and high redshift quasars, and are limited to the detection of dwarfs of spectral type
T3 to T6. The k-NN method proposed here has been designed to analyze datasets in a
multi-dimensional color and magnitude space, based only on the distribution of templates without the
need to define a-priori cuts. Thus it is in principle capable of going beyond these limitations,
opening the search to L and early-T dwarfs and T dwarfs of type later than T6, without introducing the
biases associated with the choice of the cuts, and of providing a more efficient and complete search.
This will be especially important during the Spitzer Warm Mission planned from the spring of
2009 when, upon exhaustion of the cryogenic LHe, the observatory will be tasked to conduct large
area surveys using only the IRAC bands at 3.6 and 4.5 μm (Storrie-Lombardi & Silbermann 2007).

In this section we study how to improve the selection efficiency of Spitzer/IRAC searches for
brown dwarfs using the k-NN method. We explore the k-NN parameter space to understand the
best strategy for these searches, and what the requirements are, in terms of auxiliary data, for their
success. We also develop techniques that allow us to assess the completeness of the result, which is
necessary to draw statistically valid conclusions from these searches.



3.1. Sample Selection 



As test datasets to illustrate our k-NN search for brown dwarfs, we use two publicly available
Spitzer/IRAC wide field surveys: the XFLS (Lacy et al. 2005) and the Bootes field of the IRAC
Shallow Survey (Eisenhardt et al. 2004).

The XFLS main field covers an area of 3.8 deg^2 at high galactic latitude, observed to a
sensitivity reaching a 5σ Vega magnitude of 18.9, 18.0, 15.7 and 15.1 at 3.6, 4.5, 5.8 and 8.0 μm
respectively (obtained with integration times of at least 60 sec per pointing). The XFLS main field
was chosen for the availability of extensive auxiliary data, including SDSS and 2MASS. The Bootes
field of the Shallow Survey instead covers an area of 8 deg^2 with limiting 5σ Vega magnitudes
of 18.4, 17.7, 15.5 and 14.8 at 3.6, 4.5, 5.8 and 8.0 μm respectively (integration time > 90 sec).
Deep near-IR J and Ks data have been obtained as part of the FLAMINGOS Extragalactic Survey
(FLAMEX, Elston et al. 2006) for 7.1 deg^2 of the IRAC field, and optical data as part of the
NOAO Deep Wide-Field Survey (NDWFS, Jannuzi & Dey 1999).



The main difference between the two samples (apart from the Shallow Survey being almost
twice the area of the XFLS) is in the depth of the optical and near-IR ancillary data. While SDSS
provides 5σ detection limits of 22.3, 23.3, 23.1, 22.3 and 20.8 in u', g', r', i' and z' (York et al.
2000), NDWFS has 5σ point source depths of 27.1, 26.1 and 25.4 in Bw, R and I respectively
(Stern et al. 2007). 2MASS provides 5σ sensitivity in J, H and Ks of 16.6, 15.9 and 15.1 respectively
(Skrutskie et al. 2006), while FLAMEX approaches a 5σ sensitivity of 21.4 and 19.5 in J and Ks
(Elston et al. 2006). The added depth of the NDWFS and FLAMEX data provides a powerful tool
to resolve ambiguities between red sources in the IRAC bands of galactic and extragalactic nature.
By testing our method on both datasets we can measure the efficiency of the brown dwarf search on
surveys with different depth and assess the auxiliary data requirements necessary to enable effective
brown dwarf searches during the Spitzer warm mission.

Figure 4 shows the distribution of point sources from the XFLS in a number of IRAC and
2MASS colors, compared with the distribution of 37 L, 7 early-T (T<4.5) and 22 late-T (T>4.5)
templates from Patten et al. (2006). We chose the colors in the diagrams to maximize the separation
between brown dwarfs and other galactic (clump near zero colors) and extragalactic (long plume
with red colors) sources. In particular (see Figure 1): (1) the J - [3.6] color is effective in separating
the L dwarfs from regular stars, mainly due to H2O absorption in the J band; (2) the Ks - [4.5]
and [3.6] - [4.5] colors separate the T dwarfs from all other sources (L dwarfs, regular stars and
extragalactic objects), due to the increasing CH4 absorption in the K and 3.6 μm bands; (3) the
[4.5] - [5.8] color is again useful to select the T dwarfs, which appear blue because of H2O absorption
in the 5.8 μm band; and (4) the [3.6] - [8.0] color also provides a strong separation of the T
dwarfs, due to the methane absorption which is stronger at shorter wavelengths than in the reddest
IRAC band. Color combinations using the H band have a discriminatory power similar to that of
colors with the J and Ks photometry: H - [3.6] shows a trend analogous to the J - [3.6] color, and
H - [4.5] has a color trend very similar to that of Ks - [4.5]. The J - Ks colors provide a
discrimination similar to that of the [3.6] - [4.5] colors, because CH4 absorption is stronger in the
Ks than in the J band.

The late-T dwarfs appear well separated from any other source, thanks especially to the
[3.6] - [4.5] IRAC color, even though some contamination persists from red high-redshift quasars
having PAH emission. The early-T and L dwarfs are however more difficult to discriminate because,
once dispersion due to photometric errors is taken into account, their color space is very similar
to the one occupied by low-redshift galaxies and regular stars. For this reason, and due to the
relatively small number of early-T dwarfs in the Patten et al. (2006) sample, our k-NN selection is
done for two separate classes only: one comprising all L and early-T templates, and one with
late-T (T>4.5) dwarfs.

To explore the effect of the presence or absence of individual bands on the selection efficiency, we
have divided our XFLS and Shallow Survey catalogs into 2 different subsamples: (1) sources having
3σ photometry in J, Ks and all four IRAC bands; (2) sources having 3σ detection in J, Ks, 3.6
and 4.5 μm. The first sample is intended to test the effectiveness of the k-NN method when all IR
bands are available (the H band has not been considered because of its unavailability in FLAMEX,
and because brown dwarf colors using the H band are very similar to the colors using J and Ks).
The second sample is instead designed to simulate the case of the Spitzer warm mission, when
only the two short wavelength IRAC channels will be available, and also to avoid the limitations
imposed by the less sensitive 5.8 and 8.0 μm bands in currently available surveys. For each sample
the search is done using a subset of the available colors, avoiding the repetition of similar colors
that would dilute the k-NN metric. The characteristics of the individual subsamples, their size and
the color combinations used in the k-NN search are listed in Table 1. Optical photometry and
imaging are not used at this stage, because only a limited number of the brown dwarfs we are using
as templates have reliable magnitudes at optical wavelengths. The optical data will however be
crucial to refine the search results later on.



3.2. Parameter Optimization 



The best values of k and $D_{th}$ for the search can be determined by using the bootstrap method
described in section 2.2. The goal is to optimize the k-NN parameters in order to have the smallest
possible number of candidates that will need follow-up observations while preserving the
completeness of the search. The jackknife method is not suitable in this case because of the very small
number of available templates in each brown dwarf class.

The completeness of the search in this case is $C = 1 - n_{FN}/n_{exp}$, where $n_{exp}$ is the number
of brown dwarfs expected to exist in the dataset and $n_{FN}$ is the number of false negatives (i.e.
true brown dwarfs not identified) in the search. The rejection efficiency can be written as
$\mathcal{E} = 1 - n_{FP}/n_{tot}$, where $n_{FP}/n_{tot}$ is the fraction of false positives, i.e. the number
$n_{FP}$ of sources incorrectly identified as brown dwarfs, with respect to the total number of
sources $n_{tot}$ in the sample.
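
As a minimal illustration of these two definitions, the sketch below evaluates them for made-up counts. The function names and the numbers in the usage lines are hypothetical and are not taken from the simulations in this paper.

```python
def completeness(n_false_neg: int, n_expected: int) -> float:
    """C = 1 - n_FN / n_exp: fraction of true brown dwarfs recovered."""
    return 1.0 - n_false_neg / n_expected


def rejection_efficiency(n_false_pos: int, n_total: int) -> float:
    """E = 1 - n_FP / n_tot: fraction of the sample not wrongly selected."""
    return 1.0 - n_false_pos / n_total


# Hypothetical counts, for illustration only.
c_example = completeness(n_false_neg=5, n_expected=500)
e_example = rejection_efficiency(n_false_pos=10, n_total=1000)
print(f"C = {c_example:.3f}, E = {e_example:.3f}")
```

Note that the two quantities are normalized differently: the completeness refers to the expected brown dwarfs, the rejection efficiency to the whole sample.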

Figure 5 shows the rejection efficiency and completeness for a Monte Carlo test using 100
randomized samples, each with 500 background sources and 500 simulated brown dwarfs, for k = 3,
5 and 7. The simulations are made for the subsample using IRAC plus J and Ks bands (for both
L/early-T and late-T searches), and for Spitzer warm mission colors (as described in Table 1). The
figure shows the resulting rejection efficiency and completeness for the XFLS; similar results are
obtained for the Shallow Survey.

The rejection efficiency curve is a decreasing function of $D_{th}$ because, when sources with larger
k-NN distances are selected, false positives are more likely to be included among the candidates. On the
other hand, for smaller $D_{th}$ more true brown dwarfs are missed, leading to a smaller completeness.
We have adopted the $D_{th}$ for which the two curves cross. The values of $D_{th}$ at the $\mathcal{E}$ and $C$ crossing
point for k = 3, 5 and 7 are listed in Table 2 for simulations of the XFLS and Shallow Survey
search subsamples. Note that $D_{th}$ tends to be smaller for L/early-T than for late-T searches. This
is because (as shown in Figure 5) the efficiency $\mathcal{E}$ of L/early-T searches drops faster with $D_{th}$ than
in late-T searches, due to the IRAC colors of L/early-T dwarfs being relatively similar to the colors
of regular late spectral type stars and low redshift galaxies (see Figure 4): in L/early-T searches
even a small increase in $D_{th}$ results in a large contamination by background sources and thus in a
fast drop of the efficiency. As a result, in L/early-T searches the crossing point $\mathcal{E} = C$ occurs at
smaller values of $D_{th}$ than in late-T searches where, thanks to the unique colors of late-T dwarfs,
the efficiency drops slowly as a function of $D_{th}$.
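
The choice of the crossing point can be sketched numerically as follows. This is only an illustration under stated assumptions: the two curves are supplied on a common grid of thresholds, the crossing is located by linear interpolation, and the function name and the toy linear curves are hypothetical rather than the simulated curves of Figure 5.

```python
import numpy as np


def crossing_threshold(d_th, efficiency, completeness):
    """Return the threshold at which efficiency and completeness cross.

    The crossing is located at the first sign change of (E - C) on the
    grid and refined by linear interpolation between the two grid points.
    """
    d_th = np.asarray(d_th, dtype=float)
    diff = np.asarray(efficiency, dtype=float) - np.asarray(completeness, dtype=float)
    idx = np.where(np.sign(diff[:-1]) * np.sign(diff[1:]) <= 0)[0]
    if idx.size == 0:
        raise ValueError("the two curves do not cross on this grid")
    i = idx[0]
    if diff[i] == diff[i + 1]:  # both exactly zero: curves coincide here
        return d_th[i]
    frac = diff[i] / (diff[i] - diff[i + 1])
    return d_th[i] + frac * (d_th[i + 1] - d_th[i])


# Toy curves (hypothetical): E decreases and C increases with the threshold.
grid = np.linspace(0.0, 2.0, 21)
toy_efficiency = 1.0 - 0.2 * grid
toy_completeness = 0.6 + 0.2 * grid
d_cross = crossing_threshold(grid, toy_efficiency, toy_completeness)
print(f"curves cross at D_th = {d_cross:.2f}")
```

Taking the crossing point gives equal weight to the two quantities; a search that favors completeness over rejection efficiency would instead pick a larger threshold.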

Table 2 shows that the search for late-T dwarfs using IRAC and near-IR colors combined
reaches a high level of completeness and rejection efficiency, up to 99.9%. Even if only the two short
wavelength IRAC bands are used (as will be the case in the warm mission), $\mathcal{E}$ and $C$ are still
similarly high. Note however that the completeness and rejection efficiency of the warm mission
search tend to be smaller when IRAC photometry is combined with the deeper FLAMEX dataset,
rather than the shallower 2MASS, due to the higher number of red extragalactic sources
cross-correlated with the IRAC catalog. The L/early-T search is less efficient than the late-T search,
with $C$ and $\mathcal{E}$ ~ 90%, due to the higher contamination of this sample by background sources with
similar colors.

Rejection efficiency and completeness are generally higher for small k, because in that case
the selection region follows more closely the distribution of the templates. Using a small k, however,
puts us at risk of depending critically on outliers in the template class. We adopted k = 5 for
the searches done using all IRAC bands. For the search in "warm mission" conditions, however, we
adopted k = 3 to have the maximum possible efficiency, given the larger size of the initial catalog.



3.3. k-NN Search Result and Optical Validation

The results of the k-NN search for brown dwarfs are presented in Table 3. The table shows, for
each subsample of the XFLS and Shallow Survey, the number $N_{sel}$ of selected sources. The actual
rejection efficiency $\mathcal{E}' = 1 - N_{sel}/N_{tot}$ (where $N_{tot}$ is the number of sources in each subsample)
is also shown. Comparison with Table 2 shows that $\mathcal{E}'$ is very similar to the rejection efficiency $\mathcal{E}$
predicted by the bootstrap Monte Carlo simulations.
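
The actual rejection efficiencies can be recomputed directly from the source counts in Table 3. The short sketch below does so; the dictionary layout and variable names are illustrative choices, while the counts are those listed in the table.

```python
# (N_tot, N_sel) for each search, as listed in Table 3.
table3_counts = {
    "XFLS L/early-T": (4552, 455),
    "XFLS late-T": (4552, 45),
    "XFLS late-T (warm)": (8133, 17),
    "Shallow L/early-T": (15847, 2831),
    "Shallow late-T": (15847, 40),
    "Shallow late-T (warm)": (71590, 1582),
}

# Actual rejection efficiency E' = 1 - N_sel / N_tot, in percent.
actual_efficiency = {
    name: 100.0 * (1.0 - n_sel / n_tot)
    for name, (n_tot, n_sel) in table3_counts.items()
}

for name, eff in actual_efficiency.items():
    print(f"{name:24s} E' = {eff:.1f}%")
```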

Table 3 shows that, when all IRAC bands are available, the k-NN method alone is efficient
enough to reduce the number of possible candidates to a size small enough to allow their visual
inspection. As only the k-NN method is used, the selected candidates are not biased by the
choice of arbitrary cuts, depending only on a metric with known completeness, normalized on the
statistical uncertainties of both data and templates. Optical data are however available, and can
be used to further reduce the candidate sample. This is still necessary in the case of L/early-T
dwarf searches, and when requiring only 3.6 and 4.5 μm photometry (resulting in a much larger
catalog).

The low temperature of brown dwarfs requires their optical colors to be very red, in contrast
with the case of extragalactic objects, which tend to have a flatter optical SED. In particular,
according to Leggett et al. (2000), L dwarfs have i' - z' > 1.6 in the SDSS photometric system
while T dwarfs have i' - z' > 3.0. According to Dahn et al. (2002), L and early-T dwarfs have
R - I > 2.0, while according to Stern et al. (2007) late-T dwarfs must have R - I > 2.5 in the
NDWFS bands. These criteria can be applied to all selected sources, as long as their i' and z'
magnitudes (for sources in the SDSS) or R and I magnitudes (for NDWFS sources) are known.
Sources missing optical photometry can still be considered as brown dwarf candidates, given that
their absence from the optical catalog may be an indication of very red colors. These colors cannot
be introduced directly in the k-NN metric because good photometry in the optical bands is missing
for a significant fraction of our templates. By using the optical bands in the form of "cuts" we
are introducing a bias associated with the choice of the cut. However, this is still more efficient than
doing the selection using cuts alone, since the candidates have been pre-selected in an unbiased
way using all the other bands with the k-NN metric. This allows us to adopt less stringent
optical color criteria while still preserving very high efficiency in the selection. It also allows us to
retain the candidates selected with the k-NN method that are missing optical data (the potentially
coolest late-T candidates), which would be eliminated by default if color cuts were the only selection
criteria.
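
The logic of this optical validation step, a color cut applied only where both magnitudes exist, can be sketched as below. The function and dictionary names are hypothetical; the thresholds are the ones quoted above, and the example magnitudes are the R and I values of source 6 in Table 5.

```python
from typing import Optional

# Minimum optical colors quoted in the text (NDWFS R - I and SDSS i' - z').
MIN_COLOR = {
    ("NDWFS", "L/early-T"): 2.0,
    ("NDWFS", "late-T"): 2.5,
    ("SDSS", "L"): 1.6,
    ("SDSS", "T"): 3.0,
}


def passes_optical_cut(blue_mag: Optional[float],
                       red_mag: Optional[float],
                       min_color: float) -> bool:
    """Keep a k-NN candidate unless its optical color is measurably too blue.

    A source with a missing magnitude is retained, since its absence from
    the optical catalog may itself indicate a very red color.
    """
    if blue_mag is None or red_mag is None:
        return True
    return (blue_mag - red_mag) > min_color


cut = MIN_COLOR[("NDWFS", "L/early-T")]
keep_detected = passes_optical_cut(22.30, 20.04, cut)   # R - I = 2.26
keep_undetected = passes_optical_cut(None, None, cut)   # no optical data
keep_blue = passes_optical_cut(20.0, 19.5, cut)         # hypothetical blue source
print(keep_detected, keep_undetected, keep_blue)
```

The important design choice is the first branch: a plain color cut would silently drop every optically undetected source, which is exactly the population where the coolest dwarfs are expected.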



The application of the criteria described above drastically reduces the number of viable
candidates. The remaining candidates can then be checked visually on the SDSS and NDWFS images,
to eliminate all sources that appear extended, that are blended with other sources, or that are
corrupted by artifacts in the images. The remaining candidates are listed in Tables 4 and 5. The
number of viable candidates after the optical criteria are applied is listed in Table 3, separately
for the $N_{opt}$ sources that possess an optical detection, and the $N_{noopt}$ that are not detected in the
optical surveys.

Note that when all IRAC, J and Ks photometry is known, cross-checking with optical data
reduces the number of late-T dwarf candidates to zero for the XFLS, and to only one for the
Shallow Survey. The lone T dwarf selected for the Shallow Survey is in fact the T4.5 dwarf IRAC
J142950.8+333011 found by Stern et al. (2007) in their search. This result shows that the k-NN
method, when used on near-IR and Spitzer data in combination with deep optical photometry, is
capable of selecting true late-T dwarfs while rejecting all sources of a different nature. In fact, the
method described here effectively rejected the second red source selected by the Stern et al. (2007)
color cuts, the z = 6.12 radio-loud quasar IRAC J142738.5+331242.

Figure 6 shows the Ks - [4.5] and [3.6] - [4.5] colors of all the candidates selected in both
surveys. Of the plotted sources, 5 have colors at odds with the brown dwarf templates. These
sources, most likely red background galaxies, have colors identical to those of T dwarfs in all
bands, with the exception of the two plotted here. The fact that they have been selected by the k-NN
method is an example of "variable dilution" in the metric, as described in section 2.2. To refine
the selection we can apply the k-NN method a second time, using only this pair of variables, the
Ks - [4.5] and [3.6] - [4.5] colors. The region corresponding to k = 3 and $D_{th}$ = 0.8 (providing
completeness $C$ > 99.9% in these two variables) is plotted in Figure 6, and confirms that these
sources are outliers. We flagged them as such in Table 5.
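
The second pass on two colors can be illustrated with a simplified k-NN distance. The sketch below is an assumption-laden stand-in, not the metric of section 2: it averages the error-normalized squared color differences over the colors, takes the mean over the k nearest templates, and omits the template sparseness term. All template and source values are hypothetical.

```python
import numpy as np


def knn_distance(x, x_err, templates, templates_err, k):
    """Simplified k-NN distance of one source from a template class.

    ASSUMPTION: each color difference is normalized by the combined
    source + template uncertainty, the squared terms are averaged over
    the colors, and the result is averaged over the k nearest templates.
    """
    x = np.asarray(x, dtype=float)
    x_err = np.asarray(x_err, dtype=float)
    templates = np.asarray(templates, dtype=float)
    templates_err = np.asarray(templates_err, dtype=float)
    variance = x_err ** 2 + templates_err ** 2           # (n_templates, n_colors)
    d = np.sqrt(np.mean((templates - x) ** 2 / variance, axis=1))
    return np.sort(d)[:k].mean()


# Hypothetical templates in (Ks - [4.5], [3.6] - [4.5]) with 0.1 mag errors.
tmpl = [[2.0, 1.0], [2.2, 1.2], [2.4, 1.5], [2.1, 0.9]]
tmpl_err = np.full((4, 2), 0.1)
D_TH = 0.8

near = knn_distance([2.1, 1.05], [0.1, 0.1], tmpl, tmpl_err, k=3)
far = knn_distance([0.5, 0.0], [0.1, 0.1], tmpl, tmpl_err, k=3)
print(f"near: {near:.2f} (selected: {near <= D_TH}), "
      f"far: {far:.2f} (selected: {far <= D_TH})")
```

Restricting the distance to the two discriminating colors removes the averaging over the other, uninformative colors that let the outliers through in the first pass.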

After these anomalies are taken into account, we are left with 1 viable L/early-T and no viable
late-T candidates in the XFLS (Table 4), and 3 viable L/early-T candidates and 1 late-T dwarf (the
T4.5 dwarf J142950.8+333011) in the Shallow Survey (Table 5). The viable L/early-T candidates
need follow-up observations (currently in progress) to determine their nature unambiguously.



4. Searching Low Mass Companions around Nearby Stars 



As a second example, illustrating a k-NN metric that uses not only colors but also absolute
magnitudes, we show the case of the search for low mass companions around nearby stars. This
search was conducted as part of the IRAC Guaranteed Time Observations (P.I. Giovanni Fazio)
programs PID 33, 34, 35, 36, and 48 (Patten et al. 2005). The survey focused on 400 stars within
30 pc from the Sun, including all stars and brown dwarfs within 5 pc known to date. The sample
included young stars with age < 120 Myr, stars with exoplanets discovered by radial velocity,
and the L and T dwarfs from Patten et al. (2006) used here as templates. Each star was imaged to
a depth of ~ 150 sec in all IRAC bands, with a field of view of ~ 5 arcmin, sufficient to detect
companions at a distance of ~ 50 to 4,000 AU from the primary, to a limiting mass of ~ 10-20 $M_J$.
The search has yielded the discovery of two new T dwarfs from the whole sample, presented in
Luhman et al. (2007). All the observations were made in a single epoch, preventing the search for
companions by virtue of their common proper motion with the primary (except for the few cases in
which a deep near-IR image was available). The candidate selection was instead based on the k-NN
method described below. While a paper analyzing the search results is in preparation (Carson et
al. 2009), here we discuss the efficiency of the k-NN method in this particular case of
brown dwarf search.

Figure 7 shows simulated [4.5] vs. [3.6] - [4.5] photometry of L, early-T and late-T dwarfs
around a nearby star at 5, 10 and 20 pc. To simulate the photometry of background sources
around the primary star, we have used the XFLS catalog, which is an adequate representation of
a background field projected at high galactic latitude. For nearby stars projected closer to the
plane of the Galaxy, the proportion of extragalactic to galactic sources will be smaller, reducing the
contamination from red extragalactic sources, while contamination from red galactic sources is likely
to increase. The latter (mass-losing evolved stars and young stellar objects) may however be
discriminated by means of auxiliary infrared observations capable of detecting the thermal emission
from their circumstellar dust. Figure 7 also shows k-NN regions plotted for k = 5 and $D_{th}$ = 1. By
adding the [4.5] magnitude to the k-NN variables the selection is in principle improved, because a
large fraction of the red background extragalactic sources are fainter than the expected brightness
of brown dwarf companions. This is especially true for T dwarfs around stars within 5-10 pc from
the Sun. This advantage is reduced for searches around more distant stars, since the brightness of T
dwarf companions at d ~ 20 pc is the same as that of the extragalactic background.
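
The dependence of the template [4.5] magnitudes on the distance of the primary follows from the standard distance modulus. The sketch below illustrates the size of the shift between 5 and 20 pc; the absolute magnitude used is a hypothetical placeholder, not a value from the template sample.

```python
import math


def apparent_mag(abs_mag: float, distance_pc: float) -> float:
    """Apparent magnitude from the distance modulus m = M + 5 log10(d / 10 pc)."""
    return abs_mag + 5.0 * math.log10(distance_pc / 10.0)


# Hypothetical absolute [4.5] magnitude of a companion (illustration only).
M_45 = 13.0
mags = {d: apparent_mag(M_45, d) for d in (5.0, 10.0, 20.0)}
for d, m in mags.items():
    print(f"d = {d:4.0f} pc -> [4.5] = {m:.2f}")

# A companion at 20 pc is 5 log10(4), about 3 mag, fainter than at 5 pc.
shift_5_to_20 = mags[20.0] - mags[5.0]
```

A shift of about 3 magnitudes is what moves the companion locus into the magnitude range of the extragalactic background at 20 pc.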

We have estimated the rejection efficiency and completeness of this search using the same
bootstrap method described in section 2.2. Table 6 shows that for early-T dwarfs the selection
efficiency reaches very high values (up to 99% for primaries at 5 pc). For late-T dwarfs the rejection
efficiency is approximately the same as that obtained by using only colors as selection criteria, and
the inclusion of absolute magnitudes does not result in a dramatic improvement in the search
effectiveness (suggesting that the k-NN color selection is already close to maximum efficiency).
The $\mathcal{E}$ and $C$ obtained in the simulations are adequate for this kind of search: the fields around
the ~ 400 stars in the IRAC companion search program contained on average ~ 100
background sources each. With this efficiency, for each field the chosen k-NN criteria selected up
to 3 candidates. Many of them were actually present in at least one 2MASS map, and could be ruled
out either by the absence of proper motion, or because they did not possess the correct near-IR
colors. The few remaining candidates were followed up spectroscopically and by acquiring deep
near-IR images, resulting in the two new T dwarfs found around HN Peg and HD 3651, presented
in Luhman et al. (2007).
5. Discussion and Conclusions



Based on the photometry given by Patten et al. (2006), the sensitivity of the XFLS and the
Shallow Survey in the IRAC bands allows for the detection of late-T dwarfs (T4 to T8 spectral
type) up to a distance of ~ 20 pc. This limit is determined mainly by the lower sensitivity of the
5.8 and 8.0 μm bands, for the latest spectral types. If only the 3.6 and 4.5 μm bands are used, as in
the Spitzer warm mission, the higher sensitivity allows detection of late-T dwarfs up to a distance
of ~ 50 pc. The detection limit at the L-T boundary is ~ 60 pc (~ 140 pc if only the 3.6 and
4.5 μm bands are required). Within this volume (corrected for the Malmquist bias), we can expect
a brown dwarf search to be as complete as $C$ > 98%, as estimated in section 3.2 (multiplied by the
completeness of the original survey, and corrected for binarity).

Searches using 2MASS photometry, however, will have more stringent limits, of < 5 pc for
late-T dwarfs and ~ 25 pc at the L-T boundary. The sensitivity of the FLAMEX survey is such
that any L or T dwarf detected in the IRAC 3.6 and 4.5 μm bands will also be detected in the J
band, even though the lower sensitivity of the K band limits the maximum detection distance for
a T8 dwarf to ~ 32 pc. The very small number of brown dwarf candidates that are not optically
detected in at least the I band shows that the depth of the NDWFS is not a significant constraint
in the search for brown dwarfs. The main limitation comes instead from the depth of the IRAC data.

These considerations help in understanding the potential for brown dwarf searches in
the recently approved Exploration Science surveys of the Spitzer warm mission. The requirements
for brown dwarf detection are clearly a large survey area, deep observations and the availability of
matching near-IR and possibly optical data. Of the approved warm mission programs, three satisfy
these requirements: the "Spitzer Extragalactic Representative Volume Survey (SERVS)" (PI Mark
Lacy, PID 60024), the "Spitzer Extended Deep Survey (SEDS)" (PI Giovanni Fazio, PID 60022)
and "GLIMPSE360: Completing the Spitzer Galactic Plane Survey" (PI Barbara Whitney,
PID 60020).

Our analysis shows that when near-IR data of sufficient depth are available (as in the case of
the FLAMEX survey), the search for late-T dwarfs using the photometric k-NN technique described
in section 3 is extremely efficient and complete (more than 99.8% $\mathcal{E}$ and $C$). Once a single R - I or
i' - z' optical color cut is applied, the final selection produces a very small number of viable candidates
to be checked individually (4 L/early-T and 1 late-T viable candidates). It is worth noting that
the only late-T dwarf candidate selected in our search is indeed a T4.5 dwarf, as discovered by
Stern et al. (2007). With $C$ ~ 98% completeness we can assert that this is the only late-T dwarf
present in the volume of the survey (7.1 deg² for the FLAMEX field with a depth of ~ 32 pc),
even when only the two short wavelength IRAC bands are considered. This number is consistent
with the results from the T dwarf UKIDSS DR2 Large Area Survey (Pinfield et al. 2008), which
estimates 17 ± 4 late-T dwarfs in an area of 280 deg² for a depth of K ~ 18.2 (corresponding to
a maximum distance of ~ 18 pc for T8 dwarfs). By scaling the search volume between the
two surveys^1 one predicts ~ 2-3 late-T dwarfs in the volume of the Shallow Survey analyzed in
this paper. This is consistent with our result (1 viable candidate in the FLAMEX search), once
Poisson statistics are considered. The number of L/early-T candidates that we found shows that, with
$C$ ~ 85% completeness, < 3 L/early-T dwarfs are present in the Shallow Survey searchable volume
(8 deg² × 60 pc).
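
The volume scaling behind the prediction of ~ 2-3 late-T dwarfs can be checked with a few lines. The sketch below uses the areas, depths and counts quoted in this paragraph; the function name is a hypothetical choice, and the Poisson probability at the end is an added illustration of the consistency argument, not a number quoted in the text.

```python
import math


def volume_ratio(area2_deg2, area1_deg2, depth2_pc, depth1_pc):
    """V2/V1 for two surveys: proportional to area times depth cubed."""
    return (area2_deg2 / area1_deg2) * (depth2_pc / depth1_pc) ** 3


# UKIDSS LAS: 17 late-T dwarfs in 280 deg^2 to ~18 pc (for T8 dwarfs).
# Shallow Survey / FLAMEX: 7.1 deg^2 to ~32 pc.
ratio = volume_ratio(7.1, 280.0, 32.0, 18.0)
expected_late_t = 17 * ratio

# Poisson probability of finding at most one object given this expectation.
p_at_most_one = math.exp(-expected_late_t) * (1.0 + expected_late_t)

print(f"volume ratio = {ratio:.3f}")
print(f"expected late-T dwarfs = {expected_late_t:.1f}")
print(f"P(N <= 1) = {p_at_most_one:.2f}")
```

With an expectation of about 2.4 objects, finding a single late-T dwarf happens roughly three times out of ten, so the two surveys are not in tension.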

These numbers can be used to estimate the potential yield of brown dwarfs in the approved
warm Spitzer Exploration Science surveys. The most promising Spitzer warm mission project for
the application of the k-NN method described here is the SERVS survey. With a total area of
18 deg², and a total exposure time of 600 sec per pointing, it will probe a brown dwarf volume
almost 9 times larger than the IRAC Shallow Survey (search depth of ~ 80 pc for late-T dwarfs)^2.
A large fraction of the survey will overlap with the VIDEO VISTA survey^3, providing a depth of
25.7, 24.6, 24.5, 24.0 and 23.5 mags in the z, Y, J, H and K bands, more than matching the depth
of SERVS in the IRAC 3.6 and 4.5 μm bands. Based on these sensitivities, SERVS may find as many
as ~ 27 late-T dwarfs and a large number of L and early-T dwarfs.

The SEDS program, on the other hand, has a much smaller survey area (~ 0.9 deg²) but a
much longer integration time (12 hours per pointing). This scales to a searchable volume of
~ 12 times the total volume of the Shallow Survey (search depth of ~ 230 pc for late-T dwarfs).
This survey can potentially provide as many as ~ 36 late-T dwarfs, even though a decrease in the
brown dwarf density should be expected as the survey probes larger distances from the galactic
mid-plane. This survey, however, may be limited by the challenge of finding ancillary optical and
near-IR data matching the depth of the IRAC photometry. While this may limit the effective
late-T dwarf searchable volume, the potential search depth offered by the IRAC data, in a high
galactic latitude region, will provide an important test for the vertical distribution of the brown
dwarf population in the Galaxy.
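
The volume factors quoted for SERVS and SEDS follow from scaling the searchable volume as the survey area times the exposure time to the power 3/4. The sketch below reproduces them approximately. Note the assumptions: the Shallow Survey exposure time of 90 sec per pointing is not stated in this section and is assumed here, the reference area is the rounded 8 deg² used in the text, and the function name is hypothetical.

```python
def volume_ratio_from_exposure(area_deg2, ref_area_deg2, t_exp_s, ref_t_exp_s):
    """V/V_ref for a flux-limited search: area ratio times (t / t_ref)^(3/4).

    The depth scales as t^(1/4), since the limiting flux scales as
    t^(-1/2) and the distance as flux^(-1/2); the volume goes as depth^3.
    """
    return (area_deg2 / ref_area_deg2) * (t_exp_s / ref_t_exp_s) ** 0.75


REF_AREA_DEG2 = 8.0   # Shallow Survey area as rounded in the text
REF_T_EXP_S = 90.0    # ASSUMPTION: Shallow Survey exposure per pointing

servs = volume_ratio_from_exposure(18.0, REF_AREA_DEG2, 600.0, REF_T_EXP_S)
seds = volume_ratio_from_exposure(0.9, REF_AREA_DEG2, 12 * 3600.0, REF_T_EXP_S)

print(f"SERVS volume / Shallow Survey volume ~ {servs:.1f}")
print(f"SEDS volume / Shallow Survey volume ~ {seds:.1f}")
```

Multiplying these factors by the ~ 3 late-T dwarfs predicted for the Shallow Survey volume gives yields of the same order as the ~ 27 and ~ 36 objects quoted for the two surveys.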

The GLIMPSE360 survey, instead, compensates for its rather shallow coverage (36 sec integration
time for each pointing) with a very large survey area (187 deg²). A substantial portion of this area
is also covered by the UKIDSS survey (Lawrence et al. 2007), implying that ~ 11 of the 17 late-T
dwarfs estimated for the whole UKIDSS may be present in the GLIMPSE360 area. According
to the GLIMPSE360 consortium, more detailed simulations by Burgasser (2004) predict 70 T0,
~ 100 T5 and ~ 15-20 T8 dwarfs in the survey search area. The challenge will be to isolate these
brown dwarfs from the high-confusion galactic field, and distinguish them from other red galactic
sources. The k-NN method can play an important role in this task.

^1 $V_2/V_1 \propto (\Omega_2/\Omega_1)\,(d_2/d_1)^3$, where $\Omega_{1,2}$ are the survey areas and $d_{1,2}$ their search distance limits.

^2 The search depth $d$ scales with the limiting flux $F$ as $d \propto F^{-1/2}$; $F$ in turn scales inversely
with the signal-to-noise ratio, S/N $\propto t^{1/2}$, where $t$ is the exposure time. This gives $d \propto t^{1/4}$
and thus $V_2/V_1 \propto (\Omega_2/\Omega_1)\,(t_2/t_1)^{3/4}$.

^3 http://www.vista.ac.uk/index.html

The main advantage of the k-NN method presented in this paper is that it allows us to perform
photometric searches using a large number of color and magnitude variables, defining complex
regions that closely follow the distribution of the sources to be selected. While similar regions can
be defined manually, our method prevents the introduction of biases in the selection due to the
choice of the cuts. With the k-NN method the selection regions in the color/magnitude space are
based only on the photometric properties of the search class and the statistical uncertainties of the
data. Also, the k-NN search can be controlled by just two parameters (the number of neighbors k
and the threshold distance $D_{th}$), instead of many arbitrary cuts, which makes it possible to quickly
experiment with different combinations of variables, and optimize the search for maximum rejection
efficiency and completeness.

The examples presented in this paper show that the k-NN method is an effective procedure
for the search of field and companion brown dwarfs in Spitzer wide field surveys. This can be an
important asset for the Spitzer warm mission surveys. As these surveys will image areas of the sky
where deep photometric catalogs in the optical and near-IR are already available, or in progress, the
k-NN method can effectively select T dwarf candidates, potentially leading to a significant increase
in the known number of members in this class. This is by no means the only potential application
of this method. The method is general enough to allow the photometric selection of sources of
any kind, as long as a sample of templates is available. If enough classes of templates are used,
the k-NN method can be the engine for the photometric classification of all sources in a survey, by
attributing to each source the class with the highest k-NN score. We are currently applying this
method of classification to the point source catalog of the SAGE survey (Meixner et al. 2006).
Table 1. Sample selection

Sample                          $N_{tot}$   k-NN Colors Used

XFLS IRAC/2MASS                 4,552       [3.6] - [4.5], [3.6] - [8.0], [4.5] - [5.8], J - [3.6], Ks - [4.5]
XFLS IRAC Warm                  8,133       [3.6] - [4.5], J - [3.6], Ks - [4.5], J - Ks
Shallow Survey IRAC/FLAMEX      15,847      [3.6] - [4.5], [3.6] - [8.0], [4.5] - [5.8], J - [3.6], Ks - [4.5]
Shallow Survey IRAC Warm        71,590      [3.6] - [4.5], J - [3.6], Ks - [4.5], J - Ks
Table 2. k-NN efficiency and completeness optimization

            L/early-T                        late-T                           late-T (warm mission)
            $D_{th}$   $C = \mathcal{E}$     $D_{th}$   $C = \mathcal{E}$     $D_{th}$   $C = \mathcal{E}$

First Look Survey
  k = 3     0.62       93.1%                 0.72       99.7%                 0.68       99.3%
  k = 5     0.74       90.3%                 0.89       98.9%                 0.87       98.3%
  k = 7     0.83       87.7%                 1.05       97.5%                 1.06       95.9%

Shallow Survey & FLAMEX
  k = 3     0.56       89.7%                 0.76       99.9%                 0.63       97.8%
  k = 5     0.68       85.2%                 0.97       99.8%                 0.81       95.3%
  k = 7     0.76       83.3%                 1.17       99.4%                 0.99       92.2%
Table 3. k-NN search result

                   k    $D_{th}$   $N_{tot}$   $N_{sel}$   $\mathcal{E}'$   $N_{opt}$   $N_{noopt}$

First Look Survey
  L/early-T        5    0.74       4,552       455         90.0%            1           2
  late-T           5    0.89       4,552       45          99.0%            ...         ...
  late-T (warm)    3    0.68       8,133       17          99.8%            ...         ...

Shallow Survey & FLAMEX
  L/early-T        5    0.68       15,847      2,831       82.1%            3           1
  late-T           5    0.97       15,847      40          99.7%            1           ...
  late-T (warm)    3    0.63       71,590      1,582       97.8%            4           1



Table 4. XFLS brown dwarf candidates

#   RA          Dec         J      H      K      [3.6]  [4.5]  [5.8]  [8.0]  Type       Notes

1   260.38831   +59.27060   16.36  15.32  14.70  14.33  14.36  14.18  13.93  L/early-T  red star?
2   261.10120   +60.03591   15.74  15.06  14.77  13.96  13.97  14.03  14.00  L/early-T
3   261.13129   +60.05562   16.59  16.05  15.20  14.59  14.42  14.52  14.36  L/early-T  galaxy?



Table 5. Shallow Survey brown dwarf candidates

#   RA [deg]     Dec [deg]    Bw     R      I      J      K      [3.6]  [4.5]  [5.8]  [8.0]  Type       Notes

1   216.278002   +34.355660   ...    ...    23.48  19.53  19.07  17.63  17.34  ...    ...    late-T     red galaxy?
2   216.603543   +34.140991   ...    ...    23.51  20.59  19.41  18.57  17.74  ...    ...    late-T     red galaxy?
3   217.032547   +34.098458   ...    ...    ...    20.32  19.16  18.07  17.58  16.64  ...    late-T     red galaxy?
4   217.462015   +33.503213   ...    ...    22.21  16.88  16.99  15.70  15.12  15.21  14.59  T4.5       J142950.9+333011
5   217.786508   +33.139283   ...    22.75  20.40  17.38  16.30  15.80  15.69  15.35  15.12  L/early-T
6   217.920577   +33.295865   27.24  22.30  20.04  17.42  16.41  15.79  15.69  15.52  15.88  L/early-T
7   218.001634   +33.925557   ...    26.00  22.33  19.29  18.63  17.70  17.22  16.32  ...    late-T     red galaxy?
8   218.003005   +33.949375   ...    23.89  21.35  18.89  18.80  17.79  17.46  16.90  16.07  late-T     red galaxy?
9   218.303091   +34.477201   26.43  21.69  19.23  16.33  15.20  14.58  14.69  14.47  14.42  L/early-T
10  218.335896   +33.850797   ...    ...    ...    20.25  18.86  17.12  17.01  16.40  15.21  L/early-T  bad I photometry?



Table 6. Companion search efficiency and completeness

              early-T                               late-T
              $D_{th}$   $C \approx \mathcal{E}$    $D_{th}$   $C \approx \mathcal{E}$

d = 5 pc      0.52       99.0%                      0.57       97.2%
d = 10 pc     0.49       97.3%                      0.55       97.1%
d = 20 pc     0.42       89.5%                      0.52       96.3%
Fig. 1. Model spectra of brown dwarfs with $T_{eff}$ ~ 18 K (L dwarf), ~ 1300 K (L-T dwarf
transition) and ~ 800 K (late T dwarf) from Burrows et al. (2006). The IRAC and 2MASS spectral
band-passes are marked, as well as the main molecular absorption features in the near- and mid-IR
range.
Fig. 2. Effect of averaging in the metric. Assume that dx and dy are the distances in the two
coordinates x and y. If the Euclidean metric is adopted, the points in the x, y space with distance
less than dx and dy in the two coordinates are included in the inner ellipse. If instead the Euclidean
metric is averaged in the two coordinates, any point within the larger ellipse will still have distance
components less than dx and dy. The point P, having individual distances from the center C less
than dx and dy, will be excluded by the smaller ellipse but still included by the ellipse defined by
the Euclidean average metric.
Fig. 3. k-NN regions around their templates in a two-dimensional color-color diagram. The
error bars for each template represent the total statistical error in equation 2. The solid line is
the k-NN region for k = 1 and no systematic error $\sigma_s(j)$: note how the contour encloses the union
of the individual regions represented in Figure 2 for $D_{avg}$ = 1. The two templates at the bottom
are separated enough from the others to form a disconnected region. When the sparseness of the
templates $\sigma_s(j)$ is taken into account, a single region emerges (dotted line). For k = 6 the region
contour becomes smoother, as shown by the dashed line. Note that for k = 6 the region around the
isolated sources becomes wider, since $\sigma_s(j)$ is calculated reaching templates from the other group.
Fig. 4. Grey points: Spitzer First Look Survey sources with good photometry (S/N > 3) in 6
bands (IRAC bands, plus J and K from the 2MASS survey). Triangles are Patten et al. (2006) L
dwarfs, diamonds are early T dwarfs (spectral type T < 4) and squares are late T dwarfs (spectral
type T > 4). Contours are the k-NN regions defined for k = 5 and a threshold distance of 1 for
L dwarfs (solid line), early T dwarfs (dashed line) and late T dwarfs (dotted line).
[Figure 5: panels with horizontal axis labeled "kNN distance".]

Fig. 5. Predicted efficiency (solid line) and completeness (dashed line) of the L/early-T and
late-T search as a function of the k-NN distance threshold, for k = 3, 5 and 7 (using all colors or
IRAC-only colors). The prediction is based on a Monte Carlo simulation seeded with a 20% subsample
of the First Look Survey database. The large dots are the actual fractions of sources rejected from
the full datasets for different values of $D_{th}$.
[Figure 6: two panels titled "First Look Survey/2MASS/SDSS" and "Shallow Survey/FLAMEX/NDWFS",
with axis label [3.6]-[4.5]; the source IRAC J142950.9+333011 is labeled.]

Fig. 6. Best brown dwarf candidates for the First Look Survey (top) and the Shallow
Survey/FLAMEX (bottom), selected using J, K and IRAC colors, satisfying the optical color
requirement. Circles are L/early-T candidates and squares late-T candidates. Large symbols have been
verified visually in the SDSS or NDWFS I band images to ensure they are single point sources
and not extended sources or blends. Small symbols are not detected in the optical surveys. Grey
symbols are the brown dwarf templates. The solid line in the bottom panel shows the k-NN region
drawn for k = 3 and $D_{th}$ = 0.8 using only the two variables in the plot (K - [4.5] and [3.6] - [4.5]).
Fig. 7. Color-magnitude diagram of simulated photometry around a nearby star. The data points
are from the XFLS. The L, early T and late T template [4.5] magnitudes are computed for brown
dwarfs at a distance of 5 (left), 10 (center) and 20 pc (right) respectively. The regions are plotted
for k = 5. Symbols are the same as in previous plots.



